Description

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] The present invention broadly relates to 
speech recognition and, more particularly, to a method 
and an apparatus for recognizing speech information 
based on prediction concerning an object to be recog- 
nized. The invention also relates to a storage medium 
for storing a program implementing the above method. 

Description of the Related Art 

[0002] Speech recognition methods are primarily divided into two types, i.e., a word speech-recognition method and a clause speech-recognition method. According to the word speech-recognition method, an input speech waveform is analyzed, and features are extracted from the waveform to produce a feature time series. The similarity between this feature time series and each word of a word dictionary, whose entries are represented by feature time series obtained in the same manner, is then calculated, and the word with the highest similarity is output as the recognition result. In the clause speech-recognition method, input speech is converted into phoneme strings, which are replaced by word strings. The word strings are then parsed and converted into character strings. Logical and semantic analyses are then performed on the character strings, so that a sentence is produced and output. Further research is being conducted on methods of providing word class information for homonyms, and on methods of converting input speech into compound nouns or into a single clause. However, such methods are very difficult to implement.
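The word speech-recognition method described above can be illustrated by a minimal sketch. It assumes that each frame is reduced to a single feature value and that similarity is measured by dynamic time warping (DTW), a standard dynamic programming (DP) matching technique; the text does not specify either choice at this point, and the word templates below are hypothetical.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW distance between two 1-D feature time series (smaller = more similar)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(features: Sequence[float], dictionary: Dict[str, List[float]]) -> str:
    """Return the dictionary word whose feature template is closest to the input."""
    return min(dictionary, key=lambda word: dtw_distance(features, dictionary[word]))


# Hypothetical word dictionary: each entry is a feature time series.
templates: Dict[str, List[float]] = {
    "send": [1.0, 3.0, 3.0, 2.0],
    "mail": [0.0, 1.0, 1.0, 0.0],
}
print(recognize_word([1.0, 3.0, 2.0], templates))
```

DTW is used here because it tolerates differences in speaking rate between the input and the stored template; any other similarity measure over feature time series would fit the same structure.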

[0003] In most cases, during conversation, humans recognize the speaker's voice by understanding it as a single meaning. While the speaker is speaking, the listener supports his/her understanding by predicting the content of the speech to some degree from the previous topic and common sense. Consequently, even if the speaker selects or pronounces some words wrongly, the listener understands him/her without any problem. Even if there are many homonyms in a conversation, the listener can determine which word the speaker means.

[0004] In contrast, conventional speech recognition systems perform speech recognition by pattern matching. More specifically, a dictionary provided for the system is searched for possible words which match a certain portion of an input speech waveform, the words found are output, and the optimal word is selected from among them. With this arrangement, if speech recognition fails at any point while it is being conducted, all subsequent processing is compromised.

[0005] Additionally, most conventional speech recognition systems assume that the input speech to be recognized conforms to the syntax of a certain language. Thus, various determinations are made in a speech recognition module, and the determination result is transferred to another process (another module). More specifically, in a speech recognition module, speech information is uniquely determined as a system command by being filtered (parsed). Not only grammatically correct speech, but also unnecessary words, such as exclamations and restated words, and non-grammatical speech, such as anastrophe (inversion) and particle dropping, are handled by language processing (verifying such words against a word database or a grammar database).
[0006] However, since parsing is performed in order to analyze the structure of syntax, elements other than syntax information are rejected. Even if a word is determined to be a significant word after parsing, general knowledge or knowledge of a specific field is not considered.

[0007] An example of a conventional speech recognition system is shown in Fig. 42. Since the flow of processing executed on input speech is unidirectional, the system processing continues to proceed in the same direction even if the processing result of the speech recognition module is incorrect. For example, an input that speech recognition determines to be syntactically correct, but that the system as a whole cannot process, is nevertheless accepted and is later returned as an error. That is, the speech recognition unit and the rest of the system perform their processing separately rather than operating together, so that complicated processing cannot be implemented. As a consequence, the performance of the entire system is seriously influenced by the result of speech recognition.

[0008] JP-A-05080793 describes an interactive understanding device which includes a speech recognition device and a word predictor which operates to constrain a vocabulary used by the speech recognition device in order to improve recognition efficiency.

SUMMARY OF THE INVENTION 

[0009] According to one aspect, the invention provides a speech information processing apparatus comprising:

a context base for storing prediction information 
representing a category of words to be subsequent- 
ly recognised; 

recognition means for recognising a word based on the category of words represented by the prediction information stored in said context base;

a knowledge base for storing knowledge concerning a category of speech information;

prediction means for predicting the category of words to be subsequently recognised based on at least one previously recognised word by referring to the knowledge stored in said knowledge base;

updating means for updating the prediction information stored in said context base based on the category of words obtained by said prediction means;
characterised in that: 

the apparatus further comprises determination means for determining whether or not input sound information is language information by referring to the knowledge stored in said knowledge base; and in that

said recognition means is arranged to perform the recognition on the input sound information when said determination means determines that the input sound information is language information.
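The cooperation of the claimed means can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation: the acoustic front-end is assumed to deliver a hypothetical sound label and a ranked list of word hypotheses, the knowledge base is reduced to hypothetical word-to-category and category-to-category tables, and falling back to the category "Act" when no transition is known is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class KnowledgeBase:
    word_category: Dict[str, str]   # hypothetical: category of each known word
    next_category: Dict[str, str]   # hypothetical: category expected after a category
    non_language_labels: Set[str]   # hypothetical: sound classes that are not language

    def is_language(self, label: str) -> bool:
        # Determination means (toy version): reject sounds labelled as non-language.
        return label not in self.non_language_labels


@dataclass
class ContextBase:
    predicted_category: str  # the prediction information


@dataclass
class SoundInput:
    label: str             # hypothetical acoustic class from a front-end
    candidates: List[str]  # word hypotheses, best acoustic match first


@dataclass
class SpeechInfoProcessor:
    kb: KnowledgeBase
    context: ContextBase
    history: List[str] = field(default_factory=list)

    def recognize(self, candidates: List[str]) -> Optional[str]:
        # Recognition means: accept only a word of the predicted category.
        for word in candidates:
            if self.kb.word_category.get(word) == self.context.predicted_category:
                return word
        return None

    def predict(self, word: str) -> str:
        # Prediction means: category expected after the recognised word.
        # The fallback to "Act" is an assumption of this sketch.
        return self.kb.next_category.get(self.kb.word_category[word], "Act")

    def process(self, sound: SoundInput) -> Optional[str]:
        if not self.kb.is_language(sound.label):
            return None  # not language information: recognition is skipped
        word = self.recognize(sound.candidates)
        if word is not None:
            self.history.append(word)
            # Updating means: store the new prediction in the context base.
            self.context.predicted_category = self.predict(word)
        return word


kb = KnowledgeBase({"send": "Act", "mail": "Object", "male": "Attribute"},
                   {"Act": "Object"}, {"cough"})
proc = SpeechInfoProcessor(kb, ContextBase("Act"))
print(proc.process(SoundInput("speech", ["sand", "send"])))
print(proc.process(SoundInput("speech", ["male", "mail"])))
```

In this sketch the second input returns "mail" rather than the acoustically preferred "male", because only words of the predicted category Object are accepted after "send".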

[0010] According to another aspect, the present invention provides a speech information processing method comprising:

a recognition step of recognising a word based on a category of words to be subsequently recognised which is represented by prediction information stored in a context base;

a prediction step of predicting the category of words to be subsequently recognised based on at least one previously recognised word by referring to knowledge concerning a category of speech information stored in a knowledge base; and

an updating step of updating the prediction information stored in said context base based on the category of words obtained in said prediction step;

characterised by a determination step of determining whether or not input sound information is language information by referring to the knowledge stored in said knowledge base; and in that said recognition step performs the recognition on the input sound information when the input sound information is determined to be language information in said determination step.
[0011] Other objects and advantages besides those discussed above shall be apparent to those skilled in the art from the description of a preferred embodiment of the invention which follows. In the description, reference is made to the accompanying drawings, which form a part thereof, and which illustrate an example of the invention. Such example, however, is not exhaustive of the various embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0012] 

Fig. 1 illustrates the configuration of the hardware of a natural-language processing apparatus according to an embodiment of the present invention;
Fig. 2 illustrates a system architecture;
Fig. 3 illustrates an implementation mode in speech recognition;
Fig. 4 illustrates the outline of understanding speech issued by humans;
Fig. 5 illustrates input processing;
Fig. 6 illustrates the configuration of a system;
Fig. 7 illustrates a schematic flow of the system processing;
Fig. 8 is a flow chart schematically illustrating the processing performed by the entire apparatus;
Fig. 9 is a flow chart illustrating the analyzing procedure of the process result;
Fig. 10 is a flow chart illustrating the recognition processing reflecting prediction information;
Fig. 11 is a flow chart illustrating the flow of speech recognition processing;
Fig. 12 is a flow chart illustrating the flow of determining the type of input sound;
Fig. 13 illustrates two prediction techniques;
Figs. 14 and 15 illustrate the classification of the categories of words;
Fig. 16 is a flow chart illustrating the procedure of setting the initial prediction;
Fig. 17 is a flow chart illustrating the word-recognition processing;
Fig. 18 is a flow chart illustrating the syllable-recognition processing;
Fig. 19 is a flow chart illustrating matching processing between the syllable recognition result and the words provided for the system;
Figs. 20 and 21 are flow charts illustrating the processing of determining the similarity of syllables;
Fig. 22 is a flow chart illustrating the processing of calculating the similarity of the corresponding word by utilizing the similarity of syllables and the recognition time;
Fig. 23 is a flow chart illustrating the indication processing;
Fig. 24 illustrates the parameter-setting/result-indication screen;
Fig. 25 illustrates an example of a syllable dictionary;
Fig. 26 illustrates an example of a word dictionary;
Fig. 27 illustrates the state transition of the context for a prediction of a subsequent input;
Fig. 28 is a flow chart illustrating the processing of generating a response to the user;
Fig. 29 illustrates an example of a language dictionary;
Fig. 30 illustrates an example of a concept dictionary;
Fig. 31 illustrates an example of rules;
Fig. 32 illustrates an example of a word dictionary;
Fig. 33 illustrates an example of syllable recognition results;
Fig. 34 illustrates a dynamic programming (DP) matching algorithm;
Fig. 35 illustrates an example of a word dictionary;
Fig. 36 illustrates an example of syllable recognition results;
Fig. 37 is a flow chart illustrating the processing of determining a recognition result and of determining whether the recognition result is to be accepted;
Fig. 38 is a flow chart illustrating the recognition-result analyzing processing;
Fig. 39 is a flow chart illustrating a concept analysis and the processing of determining an analysis result;
Fig. 40 is a flow chart illustrating the result correction processing;
Fig. 41 is a flow chart illustrating the processing of re-determining the previous recognition result; and
Fig. 42 illustrates conventional input processing.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0013] Preferred embodiments of the present inven- 
tion are described in detail below with reference to the 
accompanying drawings. 

First Embodiment 

[0014] A detailed description is given below of a first 
embodiment of the present invention with reference to 
the drawings. 

[0015] A discussion is first given of the construction of the hardware for use in a natural-language processing apparatus according to the first embodiment of the present invention. Referring to the block diagram illustrating the construction of the hardware shown in Fig. 1, an input unit 101 inputs information expressed in a natural language. It is not essential that the input information be a grammatically complete sentence, as long as it has a regular structure.

[0016] The input unit 101 is not limited to a speech recognition system for inputting and recognizing speech, and may be a keyboard for inputting characters through keys, a character recognition reader for optically reading characters from a document and recognizing them, an online/offline handwritten-character recognition reader, or a receiving unit for receiving information from another system, for example, from a character recognition system. Alternatively, two of the above input units may be combined and selectively utilized as the input unit 101.

[0017] A CPU 102 conducts calculations and Boolean operations for various processing, and controls the individual elements connected to a bus 106. An output unit 103 outputs analyzed data information, and may be a speech synthesizing unit for synthesizing speech from character information and outputting it, a display unit, such as a cathode ray tube (CRT) or a liquid crystal display unit, a printer for printing characters on a document, or a transmitter for transmitting information to another unit, such as a database. The output from the output unit 103 may be input into another unit within the same apparatus, for example, into a concept analyzing unit. Alternatively, two of the above-described units may be combined and selectively utilized as the output unit 103.

[0018] A program memory 104 stores a program including the processing procedure controlled by the CPU 102, which will be described below with reference to flow charts. The program memory 104 may be a read only memory (ROM) or a random access memory (RAM) into which the program is loaded from an external storage device.

[0019] A data memory 105 stores not only data generated by various processing, but also a knowledge base, which will be discussed below. The data memory 105 may be a RAM; in that case, the knowledge included in the knowledge base is either loaded into the data memory 105 from a non-volatile external storage medium before processing is executed, or is referred to every time the need arises. The bus 106 is used for transmitting address signals which give instructions to the individual elements controlled by the CPU 102 and for transferring data to be exchanged between the individual units.
[0020] Fig. 2 is a block diagram illustrating the basic configuration of an information processing apparatus according to the first embodiment. The information processing apparatus performs processing by using the knowledge of the knowledge base. Fig. 2 illustrates the flow of the processing executed by using this knowledge.

[0021] The information processing apparatus includes an input processing unit 201 for executing processing on the individual input signals so as to obtain input information. A context-construction/goal-inference unit 202 conducts concept analyses on the content of natural-language information input from the input unit 101 by utilizing the knowledge of a knowledge base 208, thereby understanding its meaning.
[0022] A planning unit 203 performs planning by using a context base 207 and the knowledge of the knowledge base 208 in order to achieve a goal inferred by the context-construction/goal-inference unit 202.

[0023] Based on the processing result of the planning unit 203, an execution unit 204 requests a main application unit 205 to execute processing. The main application unit 205 then executes the processing by using an application, a database, or a printer connected to the system.

[0024] A response determining unit 206 receives the processing result of the execution unit 204 and determines a response to be output to a user. In this embodiment, the response determining unit 206 analyzes the output by employing the context base 207 and the knowledge of the knowledge base 208, generates a response if required, and finally selects a method for outputting the response.

[0025] The context base 207 provides the knowledge required for the context-construction/goal-inference unit 202, the planning unit 203, and the response determining unit 206, and also stores new knowledge generated while the above units are executing processing.

[0026] The knowledge base 208 provides the knowledge required for the context-construction/goal-inference unit 202, the planning unit 203, and the response determining unit 206, and also stores new knowledge produced while the above units are executing processing.

[0027] Fig. 3 illustrates the flow of the processing per- 
formed by the information processing apparatus of the 
first embodiment. An input recognition unit 301, which 
corresponds to the input processing unit 201 shown in 
Fig. 2, recognizes the input information. 
[0028] A concept analyzing unit 302, which corresponds to the context-construction/goal-inference unit 202, the planning unit 203, and the execution unit 204, analyzes the meaning of the input information according to the recognition result of the input recognition unit 301 by utilizing the knowledge-base/context-base 306 contained in the system. Based on this analysis, the concept analyzing unit 302 predicts information to be subsequently input or requests a main application unit 303 to execute processing.

[0029] The main application unit 303, which corre- 
sponds to the main application unit 205, executes 
processing requested by the concept analyzing unit 302 
and transfers the execution result to a response gener- 
ating unit 304. 

[0030] The response generating unit 304, which per- 
forms processing on the result of the response deter- 
mining unit 206, analyzes the execution result of the 
main application unit 303 and generates a response to 
be output to the user, and also selects the optimal output 
method. 

[0031] The response generating unit 304 requests an 
output synthesizing unit 305 to output the response. The 
output synthesizing unit 305 outputs the response gen- 
erated by the response generating unit 304 according 
to the selected method. The knowledge-base/context- 
base 306 of the system is used for performing process- 
ing by the response generating unit 304 and the output 
synthesizing unit 305. 

[0032] By applying the construction of the information 
processing apparatus of the first embodiment to speech 
recognition, the advantages of human speech recogni- 
tion processing are implemented in this apparatus. An 
example of a mechanism of recognizing speech issued 
by humans is given below. In this example, it is assumed 
that input speech "Send mail to May" is processed. 
[0033] Fig. 4 illustrates the outline of understanding the speech "Send mail to May" by humans. In most cases, humans recognize the speech by understanding it as a single meaning rather than by sequentially selecting possible words similar to a certain portion of the input speech waveform, as current speech recognition systems do. This is because humans recognize and understand speech not only from the speech information itself, but also by predicting its content to some extent from the surrounding context and common sense.
[0034] In order to implement this human recognition operation in a system, predictions may be made on the input information in advance. More specifically, when "Send mail to May" is input as speech, the following predictions are made. Upon recognition of the word "send", an object to follow is predicted by using the language knowledge, and the word "mail" is further predicted by using the domain knowledge.

[0035] Generally, in speech recognition, the possible words "male", "mai", "may", and "mate" may be found in the search. Among these words, "mai" may be predicted from the language knowledge to be a personal name, which is unlikely to be contained in an ordinary dictionary. However, since personal names are unlikely to come immediately after "send", "mai" is rejected. Also, since the stereotyped phrase "send to" is predicted from the language knowledge, "mate" is unlikely to be selected. Further, "too", which is a homonym of "to", is not predicted from the knowledge base. Finally, it is predicted from the concept knowledge that XXX in "send to XXX" may be an object, and from the domain knowledge that the destination of "send to" may be a human (personal name). It is thus considered that "May" can be predicted from an address book or a biographical dictionary.
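The way several kinds of knowledge prune acoustically plausible candidates can be sketched as a rescoring step. This is only an illustration: the text rejects candidates qualitatively, whereas the sketch assumes numeric acoustic scores and multiplies them by plausibility factors from hypothetical language-knowledge and domain-knowledge functions. All word lists and numbers are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# A knowledge source maps (previous words, candidate) to a plausibility in [0, 1].
KnowledgeSource = Callable[[List[str], str], float]

PERSONAL_NAMES = {"mai", "may"}        # hypothetical entries
DOMAIN_OBJECTS = {"mail", "document"}  # objects handled by the apparatus


def language_knowledge(prev: List[str], word: str) -> float:
    # Personal names are unlikely to come immediately after "send".
    return 0.1 if prev and prev[-1] == "send" and word in PERSONAL_NAMES else 1.0


def domain_knowledge(prev: List[str], word: str) -> float:
    # After "send", the apparatus expects an object it can transmit.
    if prev and prev[-1] == "send":
        return 1.0 if word in DOMAIN_OBJECTS else 0.3
    return 1.0


def rescore(prev: List[str], acoustic: Dict[str, float],
            sources: List[KnowledgeSource]) -> List[Tuple[str, float]]:
    """Combine acoustic scores with knowledge plausibilities, best candidate first."""
    ranked = []
    for word, score in acoustic.items():
        for source in sources:
            score *= source(prev, word)
        ranked.append((word, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical acoustic scores for the word following "send".
candidates = {"male": 0.8, "mai": 0.9, "mail": 0.7, "mate": 0.6}
print(rescore(["send"], candidates, [language_knowledge, domain_knowledge]))
```

Although "mai" has the best acoustic score here, the two knowledge sources push it to the bottom and "mail" to the top, mirroring the reasoning in the paragraph above.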
[0036] A comparison is then made between the 
speech recognition processing shown in Fig. 4 and the 
processing performed by the conventional speech rec- 
ognition system shown in Fig. 42. 

[0037] According to conventional input processing methods, various determinations are generally made in a recognition module, and the result is transferred to another module. The input information is recognized by executing the recognition processing and is shaped into a form receivable by an application. The flow of the processing is unidirectional, and the individual units perform the processing separately rather than operating together.

[0038] Particularly for the processing of input speech, the following method is usually employed, as illustrated in Fig. 42. The result obtained by recognizing the speech in a speech recognition unit 4201 is uniquely determined as a system command by being filtered (parsed) in a speech processing unit 4202. Not only grammatically correct speech, but also unnecessary words, such as exclamations and restated words, and non-grammatical speech, such as anastrophe (inversion) and particle dropping, are handled by language processing (verifying such words against a word database or a grammar database) in the speech processing unit 4202. Parsing is performed in order to analyze the structure of syntax, and accordingly, elements other than syntax information, which may also be useful, are rejected. Additionally, the flow of the processing is unidirectional, and even if the processing result of the speech recognition module is incorrect, the system completes the processing performed by a speech input unit 4203 and proceeds to a subsequent stage, i.e., an application unit 4207. Processing is similarly performed by a keyboard input unit and by an image input unit 4206 (an optical character reader (OCR) 4204 and an image processing unit 4205).

[0039] According to the aforementioned method, even an input which cannot be processed by the entire system is accepted, and is returned as an error from the application unit 4207. That is, the processing on speech and images performed by the input units 4203 and 4206 does not operate together with the processing of the entire system, so that only simple operations can be implemented. As a result, the performance of the entire system is seriously influenced by the result of speech recognition.

[0040] In contrast, an input processing method according to the first embodiment is shown in Fig. 5. Input information is recognized in a speech recognition unit 501 if it represents sound, and in an OCR 502 if it represents an image. The result is then analyzed in an analyzing unit 503 based on common sense and knowledge, and either a subsequent input is predicted or the result of the analyses is transferred to an application unit 504 of the system.
[0041] Particularly for processing input speech, speech recognition is ideally performed by comprehensively using speech information and other knowledge, rather than by performing speech recognition alone as in the conventional manner. The result obtained by processing speech is stored as knowledge of the whole system and is used together with the other knowledge included in the system, thereby making it possible to recognize the meaning of the speech rather than its structure. That is, according to the flow of the processing indicated by the arrows 505 and 506 shown in Fig. 5, the results of analyses are fed back to the speech recognition unit 501 and the OCR 502 so that recognition and analysis operate together, thereby improving the performance of the input processing. According to the flow of the processing indicated by the arrows 507 and 508 illustrated in Fig. 5, the analysis result and the processing of the application unit 504 operate together, thereby enhancing the performance of execution processing. As a consequence, the performance of the entire system can be improved.

[0042] In this embodiment, the input processing illus- 
trated in Fig. 5 and the system architecture shown in 
Fig. 2 are implemented. As a result, processing similar 
to the speech recognition processing performed by hu- 
mans can be achieved. More specifically, Fig. 6 is an 
overall diagram illustrating a speech recognition system 
constructed in accordance with an implementing meth- 
od, such as that shown in Fig. 3. 
[0043] When speech is input, speech recognition is conducted by a speech recognition unit 601 according to the previously made predictions and the information contained in a knowledge-base/context-base 606. For the processing of speech information, the knowledge-base/context-base 606 includes not only common knowledge effective for the processing regardless of the type of knowledge, but also knowledge concerning speech information. The concept of the recognition result is analyzed by a concept analyzing unit 602 by utilizing common sense and the knowledge of the system contained in the knowledge-base/context-base 606, thereby analyzing the meaning of the recognition result.
[0044] A main application unit 603 predicts a subsequent speech input or performs processing according to the purpose. Upon execution of the processing by the main application unit 603, a response to the user may be required, in which case a response is generated in a response generating unit 604. If it is determined that a response is most suitably given to the user in speech, the response is converted into speech in a speech synthesizing unit 605 and is output. The knowledge-base/context-base 606 of the system is also used for the above processing.

[0045] There are primarily two techniques for predicting speech, the details of which are shown in Fig. 13. Predictions may be made at two stages. When speech is recognized, a subsequent input signal may be predicted. Alternatively, when the output result of a speech recognition engine is used for internal processing, a result to be subsequently input may be predicted.
[0046] According to the first technique, a word to be subsequently input is predicted from previously input words and common sense by utilizing a knowledge base. Speech (phonemes or syllables) to be subsequently input is further predicted from the predicted word, and is utilized for enhancing the speech recognition rate. According to the second technique, a word to be subsequently input is also predicted from previously input speech and common sense by utilizing the knowledge base, and is used for smoothly performing subsequent processing.
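The step in the first technique where syllables are predicted from predicted words can be sketched as a prefix lookup: given the words currently predicted and the syllables already heard, only the syllables that can continue one of those words are expected next. The syllabifications below are hypothetical (Japanese-style readings), and how a recognizer would then restrict or weight its search with this set is not specified here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical syllable dictionary: word -> syllable sequence.
SYLLABLES: Dict[str, List[str]] = {
    "print": ["pu", "ri", "n", "to"],
    "printer": ["pu", "ri", "n", "ta", "a"],
    "mail": ["me", "e", "ru"],
}


def expected_next_syllables(predicted_words: Iterable[str], heard: List[str]) -> Set[str]:
    """Syllables that may follow `heard` if the utterance is one of the predicted words."""
    k = len(heard)
    allowed: Set[str] = set()
    for word in predicted_words:
        syllables = SYLLABLES[word]
        # Keep the word only if what was heard so far is a proper prefix of it.
        if syllables[:k] == heard and len(syllables) > k:
            allowed.add(syllables[k])
    return allowed


print(expected_next_syllables(["print", "printer", "mail"], ["pu", "ri", "n"]))
```

A set is returned because several predicted words can share a prefix, as "print" and "printer" do here.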

[0047] For example, if the domain is an apparatus for transmitting documents or mail, the state transition of the context illustrated in Fig. 27 is predicted. An initial prediction is made in the apparatus as follows. First, a prediction is made from general knowledge, such as "a user may take some action in order to operate the apparatus", that a verb is likely to be input. Then, verbs that may be accepted by this apparatus are categorized as Act, and the apparatus waits for an input of speech by predicting that a verb belonging to the category Act is to be input.

[0048] After the input of a verb belonging to the category Act is recognized, the state transition of the prediction category occurs. That is, a prediction is then made on speech belonging to a category Object. The classification of the categories, such as Act and Object, is shown in, for example, Figs. 14 and 15. For example, the category Object handled in this apparatus includes mail, document, etc.
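The transition of the prediction category from Act to Object can be sketched as a small state machine that also exposes the vocabulary the recognizer should currently activate. Only the Act to Object transition and the Object members mail and document come from the text; the Act verbs are hypothetical, and Fig. 27 may define further states that are not modelled here.

```python
from typing import Dict, Set

# Hypothetical vocabularies; only "mail" and "document" are named in the text.
CATEGORY_WORDS: Dict[str, Set[str]] = {
    "Act": {"send", "print", "delete"},
    "Object": {"mail", "document"},
}
# Only the Act -> Object transition is stated in the text.
TRANSITIONS: Dict[str, str] = {"Act": "Object"}


class CategoryPredictor:
    def __init__(self, initial: str = "Act") -> None:
        self.state = initial  # category currently predicted

    def active_vocabulary(self) -> Set[str]:
        """Words the recognizer should accept in the current state."""
        return CATEGORY_WORDS.get(self.state, set())

    def observe(self, word: str) -> bool:
        """Advance the prediction if the word belongs to the predicted category."""
        if word not in self.active_vocabulary():
            return False
        # Stay in the same state when no further transition is defined.
        self.state = TRANSITIONS.get(self.state, self.state)
        return True


predictor = CategoryPredictor()
predictor.observe("send")
print(predictor.state, sorted(predictor.active_vocabulary()))
```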

[0049] Fig. 8 is a flow chart schematically illustrating the processing performed by the entire apparatus. A subsequent object to be recognized is predicted based on a knowledge base storing information concerning knowledge.

EP 0 977 175 B1

[0050] In step S800, the system is started. Then, in 
step S801, an initial prediction is set. Fig. 16 is a flow 
chart illustrating the procedure of setting the initial pre- 
diction. In setting the initial prediction, since information 
to be recognized has not yet been input, a subsequent 
operation is predicted based on the previous operation, 
and input information is predicted based on the predict- 
ed operation. 

[0051] In step S1601, the previous operation is obtained by referring to the previous processing state of the system or the content of the user's request. If it is found in step S1602 that the previous operation cannot be obtained because there is no previous operation, as is the case immediately after the system has started, the flow proceeds to step S1608, in which an initial prediction is set in the apparatus. In this flow chart, it is determined in step S1608 that the user must request the apparatus to take some action, and verbs are activated as a context to be recognized by the speech recognition system.

[0052] On the other hand, if it is determined in step S1602 that the previous operation has been successfully obtained, the flow proceeds to step S1603, in which the operations related to the previous operation are checked by referring to the general knowledge or the domain knowledge related to the apparatus contained in the knowledge base. Then, in step S1604, the operation which is most likely to be performed is predicted from among the operations checked in step S1603. A determination is then made in step S1605 as to whether the subsequent operation has been successfully predicted. If the outcome of step S1605 is yes, the flow proceeds to step S1606, in which information related to the predicted operation is acquired from the knowledge base, and in step S1607, information to be input is predicted based on the information acquired in step S1606.

[0053] For example, if the previous operation is "print three copies of a document", the operations related to the "printing operation", such as "check print status" and "print another copy", are checked in step S1603 from the knowledge base. Then, in step S1604, it can be predicted from the previous operation "print three copies" that "check print status" is more likely to be input than "print another copy". In step S1606, by referring to the domain knowledge for the operation "check print status", related information, such as the keyword "printer status", can be obtained. By using the obtained information, subsequent input information is predicted in step S1607.

[0054] If the previous operation is "delete all the jobs", the subsequent operation cannot be obtained in steps S1603 and S1604, and the determination of step S1605 becomes no. Then, an initial prediction is set in step S1608.
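The procedure of Fig. 16 can be sketched as follows. To keep the example small, the previous operation is reduced to its verb, the knowledge base is replaced by two hypothetical tables, and "most likely" in step S1604 is approximated by the order of a list; none of these details are taken from the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical domain knowledge: related operations, most likely first (S1603/S1604).
RELATED_OPERATIONS: Dict[str, List[str]] = {
    "print": ["check print status", "print another copy"],
}
# Hypothetical keywords associated with a predicted operation (S1606).
OPERATION_KEYWORDS: Dict[str, List[str]] = {
    "check print status": ["printer status"],
}

Prediction = Tuple[str, List[str]]  # (kind of prediction, items to activate)


def initial_prediction() -> Prediction:
    # S1608: the user is expected to request some action, so activate verbs (Act).
    return ("category", ["Act"])


def set_initial_prediction(previous_verb: Optional[str]) -> Prediction:
    if previous_verb is None:                              # S1602: no previous operation
        return initial_prediction()
    related = RELATED_OPERATIONS.get(previous_verb, [])    # S1603
    if not related:                                        # S1605: prediction failed
        return initial_prediction()
    predicted = related[0]                                 # S1604: most likely operation
    keywords = OPERATION_KEYWORDS.get(predicted, [predicted])  # S1606
    return ("keywords", keywords)                          # S1607


print(set_initial_prediction("print"))
print(set_initial_prediction("delete"))
```

With these tables, the "print" case yields the keyword "printer status" as in paragraph [0053], while "delete" falls back to the initial prediction as in paragraph [0054].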

[0055] Referring back to Fig. 8, input information is acquired in step S802, and it is determined in step S803 whether the input information is valid. If it is valid, the process proceeds to step S804, in which the type of input information is determined. The conceivable types of information may be speech, characters, and images. In step S805, the input information is then recognized based on the prediction according to the type of information determined in step S804. More specifically, in step S805, the input information is recognized in the following manner. It is first checked whether the information is language information or non-language information, and if it is language information, the unit of information, such as a syllable or a phoneme, is determined.
[0056] The result recognized in step S805 is used in step S806, in which the processing result is analyzed, to predict the subsequent information to be recognized. Fig. 9 is a flow chart illustrating this procedure for analyzing the processing result. The prediction (step S904) is made not only from the recognition result itself, but also from an analysis of the recognition result (step S901), a determination of the correctness of the result (step S902), and a correction of the result (step S903). It is not essential that all the processing in steps S901 through S903 be performed. Alternatively, only an analysis of the recognition result or only a correction of the result may be performed. Alternatively, a combination of a determination of the correctness of the result and a correction of the result, or a combination of an analysis of the recognition result and a determination of the correctness of the result, may be performed. In step S905, the prediction for the subsequent information to be recognized is updated according to the prediction generated in step S904.

[0057] Throughout the analysis of the recognition result in step S806, processing is executed by referring to various types of knowledge. For example, when language knowledge is used, a general dictionary may be consulted. Thus, even if the input signal waveform is recognized as "flint", "print" can be determined during the analysis, since "flint" cannot be found in the general dictionary.

[0058] When domain knowledge is used, and assuming that the domain is an application provided with a mail-sending function, it can be predicted that "mail" is more likely to be input than "male". Also, by utilizing common sense (general knowledge), if printing was performed as the previous command, for example, it can be predicted that the subsequent operation may be to check the printer status.
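The two kinds of knowledge in [0057] and [0058] can be illustrated with a small sketch. The dictionary contents and domain weights are hypothetical, and Python's `difflib.get_close_matches` (a string-similarity search) is swapped in for the acoustic similarity a real recognizer would use; the text specifies neither mechanism.

```python
import difflib
from typing import Dict, List

# Hypothetical general dictionary (language knowledge). As in [0057], it is
# assumed not to contain the misrecognized word "flint".
GENERAL_DICTIONARY = {"print", "send", "mail", "male", "document", "status"}

# Hypothetical domain knowledge for a mail-sending application:
# a higher weight means the word is more plausible in this domain.
DOMAIN_WEIGHTS: Dict[str, float] = {"mail": 0.9, "male": 0.1}


def correct_with_dictionary(word: str) -> str:
    """Replace a word absent from the dictionary with its closest entry."""
    if word in GENERAL_DICTIONARY:
        return word
    # String similarity stands in for acoustic similarity (an assumption).
    matches = difflib.get_close_matches(word, sorted(GENERAL_DICTIONARY), n=1, cutoff=0.5)
    return matches[0] if matches else word


def prefer_domain_word(candidates: List[str]) -> str:
    """Among confusable candidates, pick the one the domain favors most."""
    return max(candidates, key=lambda w: DOMAIN_WEIGHTS.get(w, 0.0))
```

With these tables, `correct_with_dictionary("flint")` returns `"print"` and `prefer_domain_word(["male", "mail"])` returns `"mail"`.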

[0059] After the process has been advanced using this knowledge, it is determined in step S807 whether the system should execute the processing. For example, when recognition of an input sentence is complete and the recognized sentence instructs the system to execute the processing, it is determined in step S807 that the corresponding processing should be executed. Then, in step S808, the processing is executed, and in step S809, a response is provided to the user if necessary. If it is determined in step S807 that another input is to be made, the flow returns to step S802, since the prediction for the subsequent input has already been updated, and the subsequent input is obtained in step S802. Upon completion of a series of processing, a determination is made in step S810 as to whether the system is to be closed or to continue. If the system continues to be used, the process returns to step S801, in which an initial prediction is set based on the completed operation. If the system is to be closed, it is shut down in step S811.

[0060] The following describes how input speech is actually processed by the speech recognition system of this first embodiment according to the processing illustrated in Fig. 8. The schematic flow of the system's processing in response to the user's speech is shown in Fig. 7.

[0061] It is first predicted in this type of system that the user may request some action, and the system waits for a verb to be input into the speech recognition unit, since a request for an action usually begins with a verb. For example, when "send" is input, it is predicted that an object of "send" will follow, and the system thus waits for an object to be input.

[0062] In this manner, the system understands the words while predicting the word to be input next. If it is determined through concept analyses that execution is possible, the corresponding processing is actually executed. When it is determined from the execution result that a response should be returned to the user, a suitable response is generated, and the corresponding sentence is created. The sentence is then output to the user by the optimal method. If the optimal method is speech synthesis, the created sentence is converted into speech, which is then output to the user. For example, if mail has been successfully sent to May, a response such as "I sent mail to may@xxx, Successfully!" is returned.

[0063] The processing procedure of the information 
processing apparatus of this embodiment is discussed 
below through a specific example. 
[0064] In this example, the input speech "Send mail to May" is recognized by setting a prediction, and the corresponding processing is appropriately performed by conducting concept analyses. It is assumed that the domain of the apparatus of this embodiment is sending mail or documents by electronic mail.
[0065] The system is started in step S800, and an initial prediction is then set in step S801 while the system waits for information to be input by the user. In step S801, it is determined that the user must request the apparatus to take some action, and verbs are therefore activated as the context to be recognized by the speech recognition unit. In step S802, the information input by the user is acquired; in this case, the speech uttered by the user is recognized, and the speech information is obtained.
[0066] A determination is then made in step S803 as to whether the speech information is valid with respect to a reference, such as the sound level. If the input is invalid, the system waits for a valid input. If the input is valid, the type of input, in this case speech, is determined in step S804.

[0067] Then, in the recognition processing of the input information in step S805, the speech recognition processing shown in Fig. 10 is performed. In step S1001, speech recognition is first conducted. The details are given below with reference to the flow chart of Fig. 11. Upon input of speech, the sound information is first processed in step S1101. Then, it is determined in step S1108 whether the sound information is language information or non-language information. This determination may be made by using a language-information database and a non-language-information database of the knowledge base, or by checking the frequency of the sound information.

[0068] A specific example of the determination in step S1108 is shown in Fig. 12. In step S1201, a reference frequency range, which is set in the apparatus for determining whether input sound has been produced by a human, is obtained. If it is determined in step S1202, based on the reference frequency range obtained in step S1201, that the input sound has been produced by a human, the process proceeds to step S1208. Because input sound within the frequency range of human speech may still contain elements other than general language speech, for example, laughter and redundant words, information concerning such "specific sound", which is registered as knowledge of the apparatus, is acquired in step S1208.
[0069] Thereafter, a determination is made in step S1209 as to whether the current input is specific sound. If the outcome of step S1209 is yes, the type of input is determined in step S1210, and a flag is set in step S1211 to indicate that the input sound is non-language information. If it is determined in step S1209 that the current input is not specific sound, a flag is set in step S1212 to indicate that the input speech is language information.

[0070] On the other hand, if it is determined in step S1202 that the input sound is outside the frequency range of human speech, a flag is set in step S1203 to indicate that the input sound is non-language information. In step S1204, information concerning those sounds among the non-language information that should be specifically processed is acquired. It is then determined in step S1205 whether the input sound should be specifically processed. If the result of step S1205 is yes, the type of sound is obtained and set in step S1206. In contrast, if it is found in step S1205 that the input sound does not have to be specifically processed, the type of input sound is set to noise in step S1207.
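The branching of Fig. 12 can be sketched as a single classification function. This is a minimal sketch under stated assumptions: the frequency range, the sound labels, and the representation of each sound as a dominant frequency plus an optional classifier label are all hypothetical, since the text does not say how the frequency check or the "specific sound" lookup is performed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical reference range for human speech; [0068] only says the range
# is "set in the apparatus", so these numbers are placeholders.
HUMAN_SPEECH_RANGE_HZ = (80.0, 1100.0)

# Hypothetical registries of sounds that need special handling.
SPECIFIC_HUMAN_SOUNDS = {"laugh", "filler"}          # checked in S1208-S1209
SPECIFIC_OTHER_SOUNDS = {"doorbell", "phone_ring"}   # checked in S1204-S1205


@dataclass
class Sound:
    dominant_hz: float            # assumed summary of the sound's frequency
    label: Optional[str] = None   # assumed output of some sound classifier


def classify(sound: Sound) -> Tuple[str, Optional[str]]:
    """Return (information kind, sound type) following the Fig. 12 branches."""
    low, high = HUMAN_SPEECH_RANGE_HZ
    if low <= sound.dominant_hz <= high:              # S1201-S1202: human range
        if sound.label in SPECIFIC_HUMAN_SOUNDS:      # S1209: specific sound
            return "non-language", sound.label        # S1210-S1211
        return "language", None                       # S1212
    # S1203: outside the human range, so non-language information
    if sound.label in SPECIFIC_OTHER_SOUNDS:          # S1205: process specially
        return "non-language", sound.label            # S1206
    return "non-language", "noise"                    # S1207
```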

[0071] According to the determination process described above, the type of speech is determined in step S1102 of Fig. 11. If the input speech is "Send mail to May", the type of speech is determined to be language information. Then, the unit of the language information is further determined in step S1106.
[0072] If the user speaks quickly, the unit of the language information is determined to be a longer unit, for example, a word rather than a phoneme. If the history indicates that word recognition has not been very successful, a shorter unit, for example, a phoneme, may be used, provided it is determined that speech recognition can be performed with the highest accuracy in units of phonemes. This determination may be made automatically by the apparatus so as to improve the recognition rate. Alternatively, if the user wishes recognition to be performed in units of words, the user may set the recognition unit.
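A rule-based version of this unit selection might look like the following. The speaking-rate and success-rate thresholds, the precedence of the user's setting over the automatic rules, and the "syllable" default are assumptions made for illustration; [0072] gives only the qualitative rules.

```python
from typing import Optional


def choose_unit(
    syllables_per_second: float,
    word_success_rate: float,
    user_choice: Optional[str] = None,
    fast_rate: float = 5.0,          # hypothetical threshold for "fast" speech
    min_word_success: float = 0.6,   # hypothetical threshold for a poor history
) -> str:
    """Pick a recognition unit; all thresholds are assumed, not from the text."""
    if user_choice is not None:
        return user_choice            # the user may set the unit explicitly
    if word_success_rate < min_word_success:
        return "phoneme"              # poor word history: fall back to a shorter unit
    if syllables_per_second >= fast_rate:
        return "word"                 # fast speech: use a longer unit
    return "syllable"                 # assumed default when neither rule applies
```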
[0073] If it is determined in step S1106 that the unit of the language information is, for example, a word, word-recognition processing is performed in step S1103. The flow chart of this processing is shown in Fig. 17. In step S1701, speech recognition is performed in units of words by using the speech recognition unit. More specifically, in response to the input "Send mail to May", word recognition is implemented by selecting the word "send", which is closest to the input sound information, from the word dictionary provided for the speech recognition unit. In step S1702, the word "send" determined by the word recognition in step S1701 is obtained together with the similarity between "send" in the dictionary and the input speech (for example, 92% similarity). It is then determined in step S1703 whether the result of the word recognition is to be used.
[0074] Referring back to Fig. 11, if it is determined in step S1106 that the type of speech is a syllable, syllable-recognition processing is executed in step S1104. The flow chart of this processing is shown in Fig. 18. In step S1801, speech recognition is conducted in units of syllables by using the speech recognition unit. In this example, syllable recognition is implemented by selecting the top N syllables most similar to the input sound information from the syllable dictionary provided for the speech recognition unit. In step S1802, the result of the syllable recognition in step S1801 is acquired together with the similarity between each of the top N syllables and the corresponding dictionary entry. In step S1803, the syllables determined in step S1802 are then recognized as the word "send", which can be handled by the system, and the similarity between the whole word and the input speech is output. The details of this processing are indicated by the flow chart of Fig. 19. When recognition processing is performed in units of syllables, "se" is acquired in step S1901. Then, in step S1902, a suitable word that matches the syllable "se" is determined by using the result obtained in step S1901. It is also determined in step S1902 whether the result of the word recognition in step S1901 is to be employed.
[0075] Referring back to Fig. 11, if it is found in step S1106 that the type of speech is neither a word nor a syllable, speech recognition suitable for the corresponding type is performed in step S1105. A type of speech that is neither a word nor a syllable may be a phoneme, which is shorter than a syllable, or a stereotyped phrase, such as a sentence, which is longer than a syllable. A plurality of types of units may be used for recognizing input information until a series of operations has been completed by the user. Alternatively, only one type of unit may be used.

[0076] Referring back to Fig. 10, the result obtained by recognizing the input information in step S1001 is processed in step S1002. More specifically, it is determined in step S1703 of Fig. 17 and in step S1902 of Fig. 19 whether the recognition result is to be finally accepted as the user's input. The detailed process is shown in Fig. 37.

[0077] In step S3701, processing for determining the recognition result is executed. More specifically, it is determined in step S3701 whether the recognition result is to be accepted, for example, by applying a threshold to the similarity of the speech recognition. Assume that the threshold similarity is set to 80%. If the recognition result is "send: 85% similarity", it is determined in step S3702 that the recognition result is to be accepted, and notification that the recognition result "send" has been accepted is given in step S3703. Conversely, if the recognition result is "send: 70% similarity", it is determined in step S3702 that the recognition result is to be rejected, and it is reported in step S3704 that the recognition result has been rejected, so that the system is ready to process a subsequent input from the user.
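The threshold test of steps S3701 through S3704 reduces to a comparison like the one below. The sketch assumes the engine returns (word, similarity in percent) pairs and treats a similarity exactly equal to the threshold as accepted; the text does not specify the boundary case.

```python
from typing import List, Optional, Tuple


def determine_result(
    candidates: List[Tuple[str, float]], threshold: float = 80.0
) -> Optional[str]:
    """Accept the best candidate if its similarity (in %) reaches the threshold.

    Returns the accepted word (S3703) or None when the result is rejected
    (S3704) and the next user input should be awaited.
    """
    if not candidates:
        return None
    word, similarity = max(candidates, key=lambda c: c[1])
    return word if similarity >= threshold else None
```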
[0078] Referring back to Fig. 8, after the various types of input information are recognized in step S805, the recognition result "send" is analyzed in step S806. In step S806, the processing is executed primarily by analyzing the concept of the word "send". The flow of this processing is schematically shown in Fig. 9. The recognition result is used for predicting the subsequent object to be recognized in step S904. The prediction for the subsequent object may be made in step S904 by using not only the recognition result, but also an analysis of the recognition result (step S901), a determination of the correctness of the result (step S902), and a correction of the result (step S903). It is not essential that all the processing in steps S901 through S903 be performed. Alternatively, only an analysis of the recognition result or only a correction of the result may be performed. Alternatively, a combination of a determination of the correctness of the result and a correction of the result, or a combination of an analysis of the recognition result and a determination of the correctness of the result, may be performed. In step S905, the prediction for the subsequent information to be recognized is updated according to the prediction generated in step S904.

[0079] The analysis of the recognition result in step S806 is shown more specifically by the flow chart of Fig. 38. In step S3801, a category search is conducted on the recognition result "send" so as to acquire information, such as the attributes of the word "send" and the context of the currently activated words. Subsequently, in step S3802, the language knowledge of the word "send" is checked in the language dictionary, such as that shown in Fig. 29, and in step S3803, the concept knowledge is checked in the concept dictionary, such as that shown in Fig. 30, thereby obtaining the corresponding information. In step S3804, the operation and the object of the word "send" are then checked from the domain of the apparatus. Thereafter, in step S3805, the rules of the system concerning the input of the word "send" are checked in a rule table, such as that illustrated in Fig. 31. Finally, in step S3806, semantic analyses are conducted comprehensively by employing the above knowledge.
[0080] More specifically, the semantic analyses in step S3806 are implemented by conducting concept analyses in step S3901 of Fig. 39, utilizing the knowledge concerning the word "send" obtained in steps S3802 through S3805. In step S3802, by referring to the language dictionary, such as that shown in Fig. 29, it is found that "send" is a verb, which is followed by an object or the name of an apparatus. In step S3803, by referring to the concept dictionary, such as that shown in Fig. 30, it is found that "send" represents a physical transfer. In step S3805, by checking the rule table, such as that shown in Fig. 31, it is found that the object of "send" is mail or a document. After the concept analyses in step S3901, it is determined in step S3902 whether the recognition result "send" obtained through speech recognition is correct in terms of semantics and common sense. A determination is also made in step S3902 as to whether the recognition result "send" satisfies the prediction set for the current input.

[0081] If the domain is represented by a printer, the 
verb "send" may be included in the initial prediction. It 
is thus determined in step S3902 that the verb "send" 
satisfies the initial prediction. Then, in step S3903, a pre- 
diction is made on a subsequent input, which is preced- 
ed by "send", by utilizing various types of knowledge in 
the system. In this case, it is predicted from the word 
"send" that the user is likely to specify "sending what", 
and that a subsequent input is likely to be an object. The 
prediction set as described above is updated in step 
S3904. 
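The lookups of [0080] and the prediction of [0081] can be sketched with three small tables. The table contents, their structure, and the function name are hypothetical stand-ins for the language dictionary, concept dictionary, and rule table of Figs. 29 through 31, whose actual formats are not given in the text.

```python
from typing import Dict, Set

# Hypothetical entries modeled on the kinds of information [0080] attributes
# to Figs. 29-31; the figures' actual contents are not reproduced here.
LANGUAGE_DICTIONARY: Dict[str, Dict[str, object]] = {
    "send": {"pos": "verb", "followed_by": ["object", "device name"]},
    "mail": {"pos": "noun"},
    "document": {"pos": "noun"},
}
CONCEPT_DICTIONARY: Dict[str, str] = {"send": "physical transfer"}
RULE_TABLE: Dict[str, Set[str]] = {"send": {"mail", "document"}}


def analyze(word: str, current_prediction: Set[str]) -> Dict[str, object]:
    """Check a recognized word against the prediction and predict what follows."""
    entry = LANGUAGE_DICTIONARY.get(word)
    if entry is None:
        return {"satisfies_prediction": False, "next": set()}
    return {
        # S3902: does the word's part of speech match what was predicted?
        "satisfies_prediction": entry["pos"] in current_prediction,
        "concept": CONCEPT_DICTIONARY.get(word),
        # S3903: words the rule table allows as this word's object
        "next": set(RULE_TABLE.get(word, set())),
    }
```

Calling `analyze("send", {"verb"})` reports that the initial verb prediction is satisfied and returns `{"mail", "document"}` as the predicted objects for the next input.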

[0082] More specifically, in step S3904, the word dictionary provided for the speech recognition unit may be updated. Then, based on the concept analyses, the processing to be executed by the system in accordance with the user's purpose is determined in step S3905. In this case, since the system cannot execute the processing from the word "send" alone, it waits for subsequent information to be input, according to the determination of the analysis result made in step S807.
[0083] When the speech "mail" is received from the user, the process proceeds in a manner similar to the above. Assume that a recognition result "mall" instead of "mail" is returned after the speech recognition processing based on the prediction that the subsequent input may be an object, i.e., a noun. In the processing for determining the recognition result in step S3701, the semantic analyses are conducted in step S3806 by using the knowledge checked in steps S3801 through S3805. Upon checking the domain knowledge in step S3804, it can be concluded that "mall" is not suitable as the input word. It is then determined in step S3902 that the result "mall" should be corrected, and the flow proceeds to step S3906.
[0084] In order to re-determine the recognition result, result re-determining processing is performed in step S4001 of Fig. 40. This processing is indicated more specifically by the flow chart of Fig. 41. In step S4101, the recognition results obtained so far are re-determined. If it is determined in step S4102 that a result should be corrected, the prediction for the previous input is re-generated in step S4103. In step S4104, the recognized word is re-determined. In this case, no correction is made to the recognition result "send", since none is necessary. Thus, a prediction for the current input is re-generated in step S4002 of Fig. 40 while the recognition result "send" remains the same. The re-generated prediction is then updated in step S4003, and the current input is recognized again by utilizing another type of knowledge in step S4004.

[0085] Updating the prediction in step S4003 means merging the newly generated prediction into the previous prediction. Accordingly, this updating operation does not increase the number of predictions, which would otherwise generate more matches. On the contrary, the prediction becomes more precise, restricting the number of possible words. For example, the current prediction is updated in step S4003 with a prediction that "mail" is more likely to be input than "mall", in consideration of the domain of the system. When the current recognition result is re-examined in step S4004, it is determined that "mail" is more suitable than "mall".
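One way to realize a merge that never adds candidates is to reweight only the words already present in the previous prediction, as sketched below. The multiplicative reweighting and all scores are assumptions; the text states only that merging does not increase the number of predictions and that "mail" ends up preferred over "mall".

```python
from typing import Dict, List, Tuple


def merge_predictions(previous: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Merge a new prediction into the previous one without adding words.

    Only words already in `previous` are kept; each is reweighted by `new`,
    and words the new prediction rules out (weight 0) are dropped.
    """
    merged = {w: p * new.get(w, 0.0) for w, p in previous.items()}
    return {w: p for w, p in merged.items() if p > 0.0}


def rerank(candidates: List[Tuple[str, float]], prediction: Dict[str, float]) -> str:
    """Pick the candidate whose acoustic score times prediction weight is highest."""
    return max(candidates, key=lambda c: c[1] * prediction.get(c[0], 0.0))[0]


# Hypothetical usage: "mall" scores higher acoustically, but the merged
# domain prediction makes "mail" the better overall choice.
previous = {"mail": 1.0, "mall": 1.0, "document": 1.0}
merged = merge_predictions(previous, {"mail": 0.9, "document": 0.8, "mall": 0.05, "male": 0.5})
best = rerank([("mall", 0.8), ("mail", 0.7)], merged)
```

Note that `"male"` from the new prediction does not enter `merged`, so the candidate set can only shrink or be reweighted.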

[0086] As discussed above, upon completion of recognizing the words "Send mail to May", it is determined in step S807 of Fig. 8 that the system should execute the processing. Then, in step S808, a command indicating "Send mail to May" is executed. During execution, May's mail address is looked up in an address book provided in the system to ensure that the mail is sent correctly. After step S808, if it is determined that a response should be returned to the user, output processing is executed in step S809. Fig. 28 is a flow chart illustrating the process of generating a response to the user. More specifically, in step S2801, the status of the execution result is acquired. Then, in step S2802, the response to be output to the user is analyzed. In step S2803, a response to the user is actually generated by utilizing the knowledge of the system, and in step S2804, an actual sentence is created. In step S2805, the optimal output method is selected, and in step S2806, the response is output to the user. For example, a confirmation message, such as "Mail has been sent to May", may be output to the user as sound by speech synthesis.
Second Embodiment 

[0087] A description is given below of the flow of the processing performed on a speech input, such as "Send mail to May", when a syllable dictionary, a word dictionary, a language dictionary, and a concept dictionary are provided, as illustrated in Figs. 25, 26, 29, and 30, respectively. A parameter setting procedure in the recognition processing is also discussed. It is assumed that M words are registered in the word dictionary and that each word i is divided into R[i] syllables, as shown in Fig. 32.

[0088] When the system starts, a screen such as that shown in Fig. 24 appears. The parameter setting procedure is indicated by the flow chart of Fig. 23. In step S2301, parameters for setting the speech recognition engine, such as the Garbage level, the Rejection Time, and the Minimum Speech Duration, are indicated. In step S2302, a reference similarity is indicated as a condition for determining the recognition result. Thereafter, in step S2303, the content of the currently activated context is indicated.

[0089] When a speech recognition method is employed in which words are inferred from the result of syllable recognition in response to the speech input "Send mail to May", the following processing is performed. Upon receipt of the sound information "send", the sound information is first processed in step S1101 of Fig. 11. If it is determined in step S1102 that the type of processed information is a syllable, syllable-recognition processing is performed in step S1104. Then, speech recognition is conducted in units of syllables by using the speech recognition engine in step S1801 of Fig. 18, thereby obtaining the top N syllables as the recognition results. Referring back to Fig. 23, in step S2304, the recognition results obtained by the speech recognition engine and their similarities are acquired, and in step S2305, the obtained information, i.e., the syllable-level recognition results and their similarities, is indicated in descending order of similarity. In step S2306, the recognition results output from the speech recognition engine are subjected to determination processing, and the results are output according to the outcome of that determination.

[0090] In the determination processing in step S2306, the results of the speech recognition engine are obtained in step S1802 of Fig. 18. In response to the speech input "send", a result such as that shown in Fig. 33 is obtained. Using this result, syllable-to-word matching processing is performed in step S1803. More specifically, in step S1901, matching processing is performed to determine a suitable word from the syllables. In this embodiment, a dynamic programming (DP) matching method is used as the syllable-to-word matching method. In this DP matching method, matching is made between the word dictionary and the input speech according to the algorithm illustrated in Fig. 34.
[0091] The flow of the DP matching processing is shown in Fig. 20. In step S2001, the number N of top syllables and the number T of syllables recognized so far (the recognition time) are acquired. Fig. 35 shows that N is 5 and T is 0. Then, in step S2002, the number M of words registered in the system and the number R[i] of syllables forming each word are acquired. Fig. 36 shows that M is 3, and R[1], R[2], and R[3] are 2, 2, and 4, respectively. In step S2003, i is set to 1. Then, while the condition i ≤ M checked in step S2004 is satisfied, the following processing is repeated. When it is determined in step S2004 that i is 1, the process proceeds to step S2005, in which the dictionary word W[i], i.e., "send", is obtained. In step S2006, j is set to 1. Then, while the condition j ≤ R[1] (= 2) checked in step S2007 is satisfied, the processing for obtaining the syllables forming "send" is repeated. In step S2008, S[1][1] = "se" is obtained.
[0092] Subsequently, in step S2101 of Fig. 21, k is set to 1. While the condition k ≤ N (= 5) is met in step S2102, it is determined whether S[1][1] has been returned as a syllable that may match the input information. When it is determined in step S2102 that k is 1, the recognition result C[k] and the similarity A[k] are acquired in step S2103, resulting in C[1] = "nd" and A[1] = 60.4. When it is determined in step S2105 that S[1][1] ≠ C[1], k is incremented by one in step S2104, and the next syllable that may match the input information is obtained and examined. When C[3] = "se" and A[3] = 38.9, the outcome of step S2105 is yes. The process thus proceeds to step S2106, in which the similarity D[1][1] of the syllable S[1][1] is set to A[3] = 38.9. Then, in step S2108, the similarity CD[1][1] of the word W[1] = "send" is calculated.

[0093] Referring to Fig. 22, it is then determined in step S2201 whether T is 0. In this case, since T is 0, the optimal path is calculated in step S2211 according to the equations illustrated in Fig. 34. That is, P1 = 1, P2 = 2 * 60.4 = 120.8, and P3 = 0 are set in step S2211. It is thus determined in step S2212 that P2 = 120.8 is the optimal path. In step S2213, the cumulative similarity and the cumulative path are calculated.
[0094] In this case, since the optimal path is 2, the cumulative similarity CD[1][1] is 120.8 and the cumulative path is 2. Referring back to Fig. 21, j is incremented by one, i.e., j = 2, in step S2109. Then, the recognition result C[1] = "nd" and the similarity A[1] = 61.0 are acquired in step S2103. Accordingly, the determination of step S2105 becomes true. The above-described calculations are then made in step S2108, and the result CD[1][2] = 122.0 is obtained. Thereafter, j is incremented by one, i.e., j = 3, in step S2109, and the determination of step S2007 becomes false. Accordingly, i is incremented by one, i.e., i = 2, in step S2009. In this manner, the processing is repeated until i becomes 3. As a result, the word similarities CD[1][1] = 120.8, CD[1][2] = 122.0, CD[2][1] = 107.4, CD[2][2] = 41.2, CD[3][1] = 58.2, CD[3][2] = 0, CD[3][3] = 0, and CD[3][4] = 0 are obtained. When i becomes 4 in step S2009, the determination of step S2004 becomes false. The processing is thus completed, and the word-recognition result is subsequently determined in step S1902 of Fig. 19.
[0095] In step S1902, the result of the syllable-to-word matching obtained when T is 0 is evaluated. According to the calculations described above, the exact word that matches the input sound has not yet been determined, so the system waits for a subsequent input. Upon receipt of the subsequent input, the type of sound is determined in step S1102 of Fig. 11. In this case, since the previous input was a syllable and has not yet been recognized as a word, it is determined that the subsequent input is likely to be a syllable again. Syllable-recognition processing is then performed in step S1104, and syllable recognition results, such as those illustrated in Fig. 35, are returned in step S1802.

[0096] The recognition results are thus obtained in step S1802, and syllable-to-word matching is conducted in step S1803. In step S2001 of Fig. 20, the number N of top syllables obtained as recognition results and the recognition time T are acquired; here, N = 1 and T = 1. As in the processing executed when T = 0, steps S2002 through S2109 are executed. As a result, CD[1][1] = 120.8, CD[1][2] = 322.0, CD[2][1] = 107.4, CD[2][2] = 41.2, CD[3][1] = 58.2, CD[3][2] = 0, CD[3][3] = 0, and CD[3][4] = 0 are obtained. When i becomes 4 in step S2009, the determination of step S2004 becomes false, and the process proceeds to step S1902, in which the word-recognition result is determined.
[0097] Based on the determination in step S1902, the word "send" is set as the recognition result of the recognition processing in step S805 of Fig. 8. After the processed result is analyzed in step S806, it is determined in step S807 that the word "send" is to be accepted.
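The syllable-to-word DP matching of [0090] through [0096] can be sketched as follows. The exact equations of Fig. 34 are not given in the text, so this sketch assumes the classic symmetric DP-matching recurrence (diagonal step weighted 2, score normalized by the sum of the two sequence lengths), maximizing similarity rather than minimizing distance; the factor 2 in P2 = 2 * 60.4 is consistent with that weighting, but the recurrence, the lexicon, its syllable splits, and the similarity values below are all hypothetical.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def dp_word_scores(
    frames: List[Dict[str, float]], lexicon: Dict[str, List[str]]
) -> Dict[str, float]:
    """Score each word by DP-aligning its syllables with per-step syllable results.

    frames[t] maps candidate syllables at step t to their similarities (top N);
    a syllable missing from a frame contributes similarity 0.
    """
    scores: Dict[str, float] = {}
    for word, syllables in lexicon.items():
        prev: List[float] = [NEG_INF] * len(syllables)
        for t, frame in enumerate(frames):
            cur: List[float] = [NEG_INF] * len(syllables)
            for j, syllable in enumerate(syllables):
                d = frame.get(syllable, 0.0)
                if t == 0 and j == 0:
                    cur[j] = 2 * d                       # start of the path
                    continue
                options = []
                if t > 0:
                    options.append(prev[j] + d)          # same syllable, next step
                if t > 0 and j > 0:
                    options.append(prev[j - 1] + 2 * d)  # diagonal step, weighted 2
                if j > 0:
                    options.append(cur[j - 1] + d)       # next syllable, same step
                cur[j] = max(options)
            prev = cur
        # Normalize so that words of different lengths can be compared.
        scores[word] = prev[-1] / (len(frames) + len(syllables))
    return scores


# Hypothetical lexicon and two steps of top-N syllable results.
lexicon = {"send": ["se", "nd"], "mail": ["ma", "il"], "document": ["do", "cu", "ment"]}
frames = [{"se": 40.0, "nd": 20.0}, {"nd": 60.0, "se": 10.0}]
scores = dp_word_scores(frames, lexicon)
best_word = max(scores, key=scores.get)
```

With these made-up inputs, "send" scores 50.0 and the other words score 0.0, so `best_word` is `"send"`.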

[0098] As described above, the speech input "Send mail to May" is first processed, the mail is transmitted to May, and finally, a response is output to the user. The whole processing is then completed.
[0099] Predictions made in units of short speech segments, for example, syllables, are particularly effective when only part of the whole speech has been recognized. For example, even if the input sound forming a word was not completely recognized, highly precise recognition can be expected if predictions made using the knowledge base are combined with the occurrence probability of each unit and the inter-state transition probability.
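A simple way to combine the three factors named in [0099] is to multiply them and pick the best-scoring next unit. The independence of the factors, the log-space scoring, and all probability values are assumptions; the text does not say how the factors are combined.

```python
import math
from typing import Dict, Tuple


def best_next_unit(
    occurrence: Dict[str, float],
    previous_unit: str,
    transition: Dict[Tuple[str, str], float],
    prediction: Dict[str, float],
) -> str:
    """Pick the next unit maximizing occurrence x transition x prediction.

    The three factors are treated as independent probabilities and summed
    in log space; any zero factor rules the unit out.
    """
    def log_score(unit: str) -> float:
        factors = (
            occurrence[unit],
            transition.get((previous_unit, unit), 0.0),
            prediction.get(unit, 0.0),
        )
        if min(factors) <= 0.0:
            return -math.inf
        return sum(math.log(f) for f in factors)

    return max(occurrence, key=log_score)
```

With hypothetical values where "n" is acoustically likelier than "nd" after "se", a knowledge-based prediction favoring the word "send" can still tip the choice to "nd".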
[0100] When the recognition result is output, the determination of step S2307 is made based on the analysis of the processed result in step S806. If it is determined in step S2307 that the recognition result "send" is accepted, it is output as the final recognition result in step S2308. The aforementioned processing is similarly performed on the subsequent input. If any parameters are changed on the screen shown in Fig. 24, the newly set parameters take effect, and subsequent determinations are made using the new parameters.

Third Embodiment 

[0101] In the first and second embodiments, English-language speech recognition is performed. However, Japanese may also be recognized based on predictions, and the corresponding processing may be executed appropriately by conducting concept analyses. In this case, language information is provided as a dictionary, and the concept analyses do not depend on the type of language. Thus, differences between languages, such as English and Japanese, do not influence speech recognition that utilizes concept analyses.

Fourth Embodiment 

[0102] According to the prediction technique shown in Fig. 13, not only speech information that is likely to be input but also information that is unlikely to be input is predicted. Thus, the fact that exactly the same information is not input consecutively may be used for predictions so as to eliminate information that is unlikely to be input, thereby enhancing the recognition efficiency.
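A minimal Python sketch of this negative prediction, assuming candidates are plain strings and that the only rule applied is "the last input is not repeated". The function name `predict_next` and the example commands are hypothetical, not taken from Fig. 13.

```python
from typing import List, Sequence, Set, Tuple


def predict_next(candidates: Sequence[str],
                 history: Sequence[str]) -> Tuple[List[str], Set[str]]:
    """Split candidate inputs into (likely, unlikely) for the next utterance.

    Assumption taken from the text: exactly the same information is not input
    consecutively, so the most recent input is predicted as unlikely.
    """
    last = history[-1] if history else None
    unlikely = {c for c in candidates if c == last}
    # Eliminating the unlikely candidates shrinks the recognition search space.
    likely = [c for c in candidates if c not in unlikely]
    return likely, unlikely


likely, unlikely = predict_next(["send", "delete", "forward"], ["open", "send"])
print(likely, unlikely)  # ['delete', 'forward'] {'send'}
```

A fuller system would draw further "unlikely" rules from the knowledge base; the consecutive-repeat rule is simply the one the text names.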

[0103] As is apparent from the foregoing description, the present invention offers the advantage of improving recognition accuracy by performing speech recognition based on predictions.

[0104] The present invention may be applied to a single apparatus or to a system formed of a plurality of apparatuses.

[0105] In another embodiment of the present invention, software program code for implementing the above-described functions may be supplied to an apparatus or a system, and a computer within the apparatus or the system may read the program code stored in a storage medium and execute it, so that the above-described functions can be implemented.

[0106] The functions of the foregoing embodiments can be implemented not only by running the program code read by the computer, but also by having, for example, an operating system (OS) running on the computer execute the processing according to the instructions of the program code.

[0107] According to the above-described modifications, a storage medium storing the program code constitutes the present invention.

[0108] Although the present invention has been described in its preferred form with a certain degree of particularity, many apparently widely different embodiments of the invention can be made without departing from the scope of the claims. It is to be understood that the invention is not limited to the specific embodiments thereof except as defined in the appended claims.
Claims

1. A speech information processing apparatus comprising:

a context base (207) for storing prediction information representing a category of words to be subsequently recognised;

recognition means (601) for recognising a word based on the category of words represented by the prediction information stored in said context base (207);

a knowledge base (208) for storing knowledge concerning a category of speech information;

prediction means for predicting the category of words to be subsequently recognised based on at least one previously recognised word by referring to the knowledge stored in said knowledge base (208); and

updating means for updating the prediction information stored in said context base (207) based on the category of words obtained by said prediction means;

characterised in that:

the apparatus further comprises determination means for determining whether or not input sound information is language information by referring to the knowledge stored in said knowledge base (208); and in that

said recognition means (601) is arranged to perform the recognition on the input sound information when said determination means determines that the input sound information is language information.

2. A speech information processing apparatus according to claim 1, wherein said prediction means is operable to further predict a category of words which is less likely to be subsequently recognised and wherein said recognition means (601) is arranged to reject the category of words which is less likely to be subsequently recognised.

3. A speech information processing apparatus according to claim 1 or 2, further comprising correctness determination means for determining the correctness of a recognition result obtained by said recognition means (601), and wherein said prediction means is arranged to perform a prediction based on a determination result obtained by said correctness determination means.
4. A speech information processing apparatus according to claim 3, further comprising correction means for correcting the recognition result based on a determination result obtained by said correctness determination means, and wherein said prediction means is arranged to perform a re-prediction based on the recognition result corrected by said correction means.

5. A speech information processing apparatus according to claim 4, wherein said correction means is arranged to correct a new recognition result by referring to the knowledge stored in said knowledge base (208) based on a previous recognition result.

6. A speech information processing apparatus according to any preceding claim, further comprising sound determining means for determining whether or not sound information is speech information produced by a human, and wherein said determination means is arranged to perform a determination on the input sound information determined by said sound determining means.

7. A speech information processing apparatus according to claim 6, wherein said sound determining means is arranged to distinguish human speech from a mechanical sound based on a frequency difference.

8. A speech information processing apparatus according to any preceding claim, wherein said recognition means (601) is arranged to recognise a word in units of one of: words, syllables and phonemes.

9. A speech information processing apparatus according to claim 8, wherein said prediction means is arranged to predict a word including one of a syllable and a phoneme to be subsequently recognised when said recognition means (601) is arranged to recognise words in units of one of: syllables and phonemes, respectively, and wherein said prediction means is arranged to predict one of: the syllable or the phoneme to be subsequently recognised based on the category of words represented by the prediction information.

10. A speech information processing apparatus according to claim 8 or 9, further comprising selection means for selecting the unit for recognition according to whether or not a previous recognition result was successfully obtained.

11. A speech information processing apparatus according to any preceding claim, wherein said prediction means is arranged to predict the category of words to be subsequently recognised based on a previous operation.

12. A speech information processing apparatus according to claim 11, wherein said prediction means is arranged to predict a subsequent operation based on the previous operation and to predict the category of words to be subsequently recognised based on the predicted subsequent operation.

13. A speech information processing apparatus according to claim 11 or 12, wherein said prediction means is arranged to predict the category of words to be initially recognised based on the previous operation.

14. A speech information processing method comprising:

a recognition step (S805) of recognising a word based on a category of words to be subsequently recognised which is represented by prediction information stored in a context base (207);

a prediction step (S904) of predicting the category of words to be subsequently recognised based on at least one previously recognised word by referring to knowledge concerning a category of speech information stored in a knowledge base (208); and

an updating step (S905) of updating the prediction information stored in said context base (207) based on the category of words obtained in said prediction step;

characterised by a determination step (S1108) of determining whether or not input sound information is language information by referring to the knowledge stored in said knowledge base (208); and in that said recognition step (S805) performs the recognition on the input sound information when the input sound information is determined to be language information in said determination step (S1108).

15. A speech information processing method according to claim 14, wherein said prediction step (S904) further predicts a category of words which is less likely to be subsequently recognised and wherein said recognition step (S805) rejects the category of words which is less likely to be subsequently recognised.

16. A speech information processing method according to claim 14 or 15, further comprising a correctness determination step (S902) of determining the correctness of a recognition result obtained in said recognition step (S805), and wherein said prediction step (S904) performs a prediction based on a determination result obtained in said correctness determination step.

17. A speech information processing method according to claim 16, further comprising a correction step (S903) of correcting the recognition result based on a determination result obtained in said correctness determination step (S902), and wherein said prediction step (S904) performs a re-prediction based on the recognition result corrected in said correction step (S903).

18. A speech information processing method according to claim 17, wherein said correction step (S903) corrects a new recognition result by referring to the knowledge stored in said knowledge base (208) based on a previous recognition result.

19. A speech information processing method according to any of claims 14 to 18, further comprising a sound determining step (S1202) of determining whether or not sound information is speech information produced by a human, and wherein said determination step performs a determination on the input sound information determined in said sound determining step.

20. A speech information processing method according to claim 19, wherein said sound determining step (S1108) distinguishes human speech from a mechanical sound based on a frequency difference.

21. A speech information processing method according to any of claims 14 to 20, wherein said recognition step (S805) recognises a word in units of one of: words, syllables and phonemes.

22. A speech information processing method according to claim 21, wherein said prediction step (S904) predicts a word including one of: a syllable and a phoneme to be subsequently recognised when said recognition step (S805) uses one of: syllables and phonemes, respectively, as a unit of recognition, and predicts one of the syllable or the phoneme to be subsequently recognised based on the category of words represented by the prediction information.

23. A speech information processing method according to claim 21 or 22, further comprising a selection step of selecting the unit of recognition according to whether or not a previous recognition result was successfully obtained.

24. A speech information processing method according to any of claims 14 to 23, wherein said prediction step (S904) predicts the category of words to be subsequently recognised based on a previous operation.

25. A speech information processing method according to claim 24, wherein said prediction step (S904) predicts a subsequent operation based on the previous operation and predicts the category of words to be subsequently recognised based on the predicted subsequent operation.

26. A speech information processing method according to claim 24 or 25, wherein said prediction step predicts the category of words to be initially recognised based on the previous operation.

27. A storage medium storing processor implementable instructions for controlling a processor to implement the method of any of claims 14 to 26.

28. Processor implementable instructions for controlling a processor to implement the method of any of claims 14 to 26.
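The apparatus of claim 1 and the method of claim 14 form a loop: decide whether the input sound is language information, recognise a word within the category held in the context base, predict the next category from the knowledge base, and write that prediction back into the context base. The Python sketch below illustrates that loop under stated assumptions and is not the patented implementation. It assumes an acoustic front end already produces scored word hypotheses; the class names `KnowledgeBase` and `Processor`, the toy categories, and the "known word" test standing in for the language/non-language determination are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (word, score) pairs from a hypothetical acoustic front end.
Hypothesis = Tuple[str, float]


@dataclass
class KnowledgeBase:
    category_of: Dict[str, str]    # word -> category of speech information
    next_category: Dict[str, str]  # category -> category expected to follow

    def is_language(self, hyps: List[Hypothesis]) -> bool:
        # Stand-in for the determination means: treat the input as language
        # information only if some hypothesis is a word the knowledge base knows.
        return any(word in self.category_of for word, _ in hyps)

    def predict(self, word: str) -> Optional[str]:
        # Prediction means: the category expected after the recognised word.
        return self.next_category.get(self.category_of.get(word, ""))


@dataclass
class Processor:
    kb: KnowledgeBase
    context: Optional[str] = None  # context base: the currently predicted category

    def process(self, hyps: List[Hypothesis]) -> Optional[str]:
        if not self.kb.is_language(hyps):
            return None  # not language information, so recognition is skipped
        # Recognition means: keep hypotheses in the predicted category, if any.
        allowed = [h for h in hyps
                   if self.context is None
                   or self.kb.category_of.get(h[0]) == self.context]
        if not allowed:
            return None
        word = max(allowed, key=lambda h: h[1])[0]
        # Updating means: store the category predicted for the next word.
        self.context = self.kb.predict(word)
        return word


kb = KnowledgeBase(
    category_of={"send": "verb", "mail": "object", "male": "adjective",
                 "to": "preposition", "May": "person", "may": "modal"},
    next_category={"verb": "object", "object": "preposition",
                   "preposition": "person"},
)
proc = Processor(kb)
utterance = [[("send", 0.9), ("sent", 0.7)],
             [("male", 0.8), ("mail", 0.7)],
             [("to", 0.9), ("two", 0.85)],
             [("may", 0.6), ("May", 0.55)]]
print([proc.process(h) for h in utterance])  # ['send', 'mail', 'to', 'May']
```

Without the prediction, the second and fourth inputs would have been recognised as "male" and "may", because those hypotheses score higher acoustically in this toy data.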



Claims

1. A speech information processing apparatus comprising:

a context base (207) for storing prediction information representing a category of words to be subsequently recognised;

recognition means (601) for recognising a word based on the category of words represented by the prediction information stored in the context base (207);

a knowledge base (208) for storing knowledge concerning a category of speech information;

prediction means for predicting the category of words to be subsequently recognised based on at least one previously recognised word with reference to the knowledge stored in the knowledge base (208); and

updating means for updating the prediction information stored in the context base (207) based on the category of words obtained by the prediction means;

characterised in that:

the apparatus further comprises determination means for determining, with reference to the knowledge stored in the knowledge base (208), whether or not input sound information is language information; and

the recognition means (601) is arranged to perform the recognition on the input sound information when the determination means determines that the input sound information is language information.


2. A speech information processing apparatus according to claim 1, wherein the prediction means is operable to further predict a category of words whose subsequent recognition is less likely, and wherein the recognition means (601) is arranged to reject the category of words whose subsequent recognition is less likely.

3. A speech information processing apparatus according to claim 1 or 2, further comprising correctness determination means for determining the correctness of a recognition result obtained by the recognition means (601), and wherein the prediction means is arranged to perform a prediction based on a determination result obtained by the correctness determination means.

4. A speech information processing apparatus according to claim 3, further comprising correction means for correcting the recognition result based on a determination result obtained by the correctness determination means, and wherein the prediction means is arranged to perform a re-prediction based on the recognition result corrected by the correction means.

5. A speech information processing apparatus according to claim 4, wherein the correction means is arranged to correct a new recognition result with reference to the knowledge stored in the knowledge base (208) based on a previous recognition result.

6. A speech information processing apparatus according to any one of the preceding claims, further comprising sound determination means for determining whether or not sound information is speech information produced by a human, and wherein the determination means is arranged to perform a determination on the input sound information determined by the sound determination means.

7. A speech information processing apparatus according to claim 6, wherein the sound determination means is arranged to distinguish human speech from a mechanical sound based on a frequency difference.

8. A speech information processing apparatus according to any one of the preceding claims, wherein the recognition means (601) is arranged to recognise a word in units of one of the following: words, syllables and phonemes.

9. A speech information processing apparatus according to claim 8, wherein the prediction means is arranged to predict a word including a syllable or a phoneme which is to be subsequently recognised when the recognition means (601) is arranged to recognise words in units of one of the following: syllables or phonemes, respectively, and wherein the prediction means is arranged to predict one of the following: the syllable or the phoneme which is to be subsequently recognised based on the category of words represented by the prediction information.

10. A speech information processing apparatus according to claim 8 or 9, further comprising selection means for selecting the unit for recognition according to whether or not a previous recognition result was successfully obtained.

11. A speech information processing apparatus according to any one of the preceding claims, wherein the prediction means is arranged to predict the category of words to be subsequently recognised based on a previous operation.

12. A speech information processing apparatus according to claim 11, wherein the prediction means is arranged to predict a subsequent operation based on the previous operation and to predict the category of words to be subsequently recognised based on the predicted subsequent operation.

13. A speech information processing apparatus according to claim 11 or 12, wherein the prediction means is arranged to predict the category of words to be initially recognised based on the previous operation.

14. A speech information processing method comprising:

a recognition step (S805) of recognising a word based on a category of words to be subsequently recognised, which is represented by prediction information stored in a context base (207);

a prediction step (S904) of predicting the category of words to be subsequently recognised based on at least one previously recognised word with reference to knowledge concerning a category of speech information which is stored in a knowledge base (208); and

an updating step (S905) of updating the prediction information stored in the context base (207) based on the category of words obtained in the prediction step;

characterised by a determination step (S1108) of determining, with reference to the knowledge stored in the knowledge base (208), whether or not input sound information is language information; and in that the recognition step (S805) performs the recognition on the input sound information when it is determined in the determination step (S1108) that the input sound information is language information.

15. A speech information processing method according to claim 14, wherein the prediction step (S904) further predicts a category of words whose subsequent recognition is less likely, and wherein the recognition step (S805) rejects the category of words whose subsequent recognition is less likely.

16. A speech information processing method according to claim 14 or 15, further comprising a correctness determination step (S902) of determining the correctness of a recognition result obtained in the recognition step (S805), and wherein the prediction step (S904) performs a prediction based on a determination result obtained in the correctness determination step.

17. A speech information processing method according to claim 16, further comprising a correction step (S903) of correcting the recognition result based on a determination result obtained in the correctness determination step (S902), and wherein the prediction step (S904) performs a re-prediction based on the recognition result corrected in the correction step (S903).

18. A speech information processing method according to claim 17, wherein the correction step (S903) corrects a new recognition result with reference to the knowledge stored in the knowledge base (208) based on a previous recognition result.

19. A speech information processing method according to any one of claims 14 to 18, further comprising a sound determination step (S1202) of determining whether or not sound information is speech information produced by a human, and wherein the determination step performs a determination on the input sound information determined in the sound determination step.

20. A speech information processing method according to claim 19, wherein the sound determination step (S1108) distinguishes human speech from a mechanical sound based on a frequency difference.

21. A speech information processing method according to any one of claims 14 to 20, wherein the recognition step (S805) recognises a word in units of one of the following: words, syllables and phonemes.

22. A speech information processing method according to claim 21, wherein the prediction step (S904) predicts a word including one of the following: a syllable and a phoneme, which is to be subsequently recognised, when the recognition step (S805) uses one of the following: syllables or phonemes, respectively, as a unit of recognition, and predicts the syllable or the phoneme which is to be subsequently recognised based on the category of words represented by the prediction information.

23. A speech information processing method according to claim 21 or 22, further comprising a selection step of selecting the unit of recognition according to whether or not a previous recognition result was successfully obtained.

24. A speech information processing method according to any one of claims 14 to 23, wherein the prediction step (S904) predicts the category of words to be subsequently recognised based on a previous operation.

25. A speech information processing method according to claim 24, wherein the prediction step (S904) predicts a subsequent operation based on the previous operation and predicts the category of words to be subsequently recognised based on the predicted subsequent operation.

26. A speech information processing method according to claim 24 or 25, wherein the prediction step predicts the category of words to be initially recognised based on the previous operation.

27. A storage medium storing instructions executable by a processing device for controlling a processing device to carry out the method according to any one of claims 14 to 26.

28. Instructions executable by a processing device for controlling a processing device to carry out the method according to any one of claims 14 to 26.



Claims

1. A speech information processing apparatus comprising:

a context base (207) for storing prediction information representing a category of words to be subsequently recognised;

recognition means (601) for recognising a word based on the category of words represented by the prediction information stored in said context base (207);

a knowledge base (208) for storing knowledge concerning a category of speech information;

prediction means for predicting the category of words to be subsequently recognised on the basis of at least one previously recognised word by referring to the knowledge stored in said knowledge base (208); and

updating means for updating the prediction information stored in said context base (207) on the basis of the category of words obtained by said prediction means;

characterised in that:

the apparatus further comprises determination means for determining whether or not input sound information is language information by referring to the knowledge stored in said knowledge base (208); and

said recognition means (601) is arranged to perform the recognition of the input sound information when said determination means determines that the input sound information is language information.

2. A speech information processing apparatus according to claim 1, wherein said prediction means is operable to further predict a category of words which is less likely to be subsequently recognised, and wherein said recognition means (601) is arranged to reject the category of words which is less likely to be subsequently recognised.

3. A speech information processing apparatus according to claim 1 or 2, further comprising correctness determination means for determining the correctness of a recognition result obtained by said recognition means (601), and wherein said prediction means is arranged to perform a prediction on the basis of a determination result obtained by said correctness determination means.

4. A speech information processing apparatus according to claim 3, further comprising correction means for correcting the recognition result on the basis of a determination result obtained by said correctness determination means, and wherein said prediction means is arranged to perform a new prediction on the basis of the recognition result corrected by said correction means.

5. A speech information processing apparatus according to claim 4, wherein said correction means is arranged to correct a new recognition result by referring to the knowledge stored in said knowledge base (208) on the basis of a previous recognition result.

6. A speech information processing apparatus according to any one of the preceding claims, further comprising sound determination means for determining whether or not sound information is speech information produced by a human being, and wherein said determination means is arranged to perform a determination on the input sound information determined by said sound determination means.

7. A speech information processing apparatus according to claim 6, wherein said sound determination means is arranged to distinguish human speech from a mechanical sound on the basis of a frequency difference.

8. A speech information processing apparatus according to any one of the preceding claims, wherein said recognition means (601) is arranged to recognise a word in units of one of: words, syllables and phonemes.

9. A speech information processing apparatus according to claim 8, wherein said prediction means is arranged to predict a word including one of a syllable and a phoneme to be subsequently recognised when said recognition means (601) is arranged to recognise words in units of one of: syllables and phonemes, respectively, and wherein said prediction means is arranged to predict one of: the syllable or the phoneme to be subsequently recognised on the basis of the category of words represented by the prediction information.

10. A speech information processing apparatus according to claim 8 or 9, further comprising selection means for selecting the unit for recognition according to whether or not a previous recognition result was successfully obtained.

11. A speech information processing apparatus according to any one of the preceding claims, wherein said prediction means is arranged to predict the category of words to be subsequently recognised on the basis of a previous operation.

12. A speech information processing apparatus according to claim 11, wherein said prediction means is arranged to predict a subsequent operation on the basis of the previous operation and to predict the category of words to be subsequently recognised on the basis of the predicted subsequent operation.

13. A speech information processing apparatus according to claim 11 or 12, wherein said prediction means is arranged to predict the category of words to be initially recognised on the basis of the previous operation.

14. A speech information processing method comprising:

a recognition step (S805) consisting in recognising a word on the basis of a category of words to be subsequently recognised which is represented by prediction information stored in a context base (207);

a prediction step (S904) consisting in predicting the category of words to be subsequently recognised on the basis of at least one previously recognised word by referring to knowledge concerning a category of speech information stored in a knowledge base (208); and

an updating step (S905) consisting in updating the prediction information stored in said context base (207) on the basis of the category of words obtained in said prediction step;

characterised by a determination step (S1108) consisting in determining whether or not input sound information is language information by referring to the knowledge stored in said knowledge base (208); and in that said recognition step (S805) performs the recognition on the input sound information when the input sound information is determined to be language information in said determination step (S1108).

1 5. Procede de traitement d'informations vocales selon 
la revendication 1 4, dans lequel ladite etape de pre- 
diction (S904) predit en outre une categorie de mots 
moins susceptible d'etre ensuite reconnue et dans 
lequel ladite etape de reconnaissance (S805) rejet- 
te la categorie de mots moins susceptible d'etre en- 

1 6. Procede de traitement d'informations vocales selon 
la revendication 1 4 ou 15, comprenant en outre une 
etape (S902) de determination d'exactitude consis- 
tant a determiner I'exactitude d'un resultat de recon- 
naissance obtenu dans ladite etape de reconnais- 
sance (S805), et dans lequel ladite etape de predic- 
tion (S904) effectue une prediction sur la base d'un 
resultat de determination obtenu dans ladite etape 
de determination d'exactitude. 

17. A speech information processing method according to claim 16, further
comprising a correction step (S903) of correcting the recognition result on the
basis of a determination result obtained in said accuracy determination step
(S902), and wherein said prediction step (S904) performs a new prediction on the
basis of the recognition result corrected in said correction step (S903).

18. A speech information processing method according to claim 17, wherein said
correction step (S903) corrects a new recognition result by referring to the
knowledge stored in said knowledge base (208) on the basis of a previous
recognition result.

19. A speech information processing method according to any one of claims 14 to
18, further comprising a sound determination step (S1202) of determining whether
or not sound information is speech information produced by a human being, and
wherein said determination step performs a determination on the input sound
information determined in said sound determination step.

20. A speech information processing method according to claim 19, wherein said
sound determination step (S1108) distinguishes human speech from mechanical
sound on the basis of a difference in frequency.

21. A speech information processing method according to any one of claims 14 to
20, wherein said recognition step (S805) recognizes a word in units of one of:
words, syllables and phonemes.

22. A speech information processing method according to claim 21, wherein said
prediction step (S904) predicts a word including one of a syllable and a phoneme
to be recognized next when said recognition step (S805) uses one of syllables
and phonemes, respectively, as the recognition unit, and predicts the one of the
syllable and the phoneme to be recognized next on the basis of the word category
represented by the prediction information.

23. A speech information processing method according to claim 21 or 22, further
comprising a selection step of selecting the recognition unit according to
whether or not a previous recognition result has been successfully obtained.

24. A speech information processing method according to any one of claims 14 to
23, wherein said prediction step (S904) predicts the category of words to be
recognized next on the basis of a previous operation.

25. A speech information processing method according to claim 24, wherein said
prediction step (S904) predicts a next operation on the basis of the previous
operation and predicts the category of words to be recognized next on the basis
of the predicted next operation.

26. A speech information processing method according to claim 24 or 25, wherein
said prediction step predicts the category of words to be recognized initially
on the basis of the previous operation.

27. A storage medium storing processor-executable instructions for controlling a
processor to carry out the method according to any one of claims 14 to 26.

28. Processor-executable instructions for controlling a processor to carry out
the method according to any one of claims 14 to 26.
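The interplay of the prediction, rejection and context-updating steps recited in claims 14 and 15 can be illustrated with a minimal Python sketch. It is not the patented implementation: the word dictionary, the category-transition tables, the scores and the names `Context`, `recognize` and `update` are all hypothetical, and the acoustic recognizer is replaced by a list of pre-scored candidate words.

```python
from dataclasses import dataclass, field

# Hypothetical word dictionary: word -> word category (illustrative data only).
WORD_CATEGORIES = {
    "open": "verb", "close": "verb",
    "file": "noun", "window": "noun",
    "quickly": "adverb",
}

# Hypothetical knowledge: categories likely / less likely to follow a category.
LIKELY_NEXT = {"verb": {"noun"}, "noun": {"verb", "adverb"}}
UNLIKELY_NEXT = {"verb": {"verb"}, "noun": {"noun"}}


@dataclass
class Context:
    """Stands in for the context base: holds the current prediction."""
    likely: set = field(default_factory=lambda: {"verb"})
    unlikely: set = field(default_factory=set)


def recognize(candidates, ctx):
    """Return the best-scoring candidate word, or None if all are rejected."""
    # Reject candidates whose category was predicted as less likely.
    kept = [(w, s) for w, s in candidates
            if WORD_CATEGORIES.get(w) not in ctx.unlikely]
    # Prefer candidates whose category was predicted as likely.
    preferred = [(w, s) for w, s in kept
                 if WORD_CATEGORIES.get(w) in ctx.likely]
    pool = preferred or kept
    if not pool:
        return None
    return max(pool, key=lambda ws: ws[1])[0]


def update(ctx, word):
    """Predict categories for the next word and store them in the context."""
    category = WORD_CATEGORIES[word]
    ctx.likely = LIKELY_NEXT.get(category, set())
    ctx.unlikely = UNLIKELY_NEXT.get(category, set())


ctx = Context()
first = recognize([("file", 0.9), ("open", 0.8)], ctx)       # verb expected first
update(ctx, first)
second = recognize([("close", 0.95), ("window", 0.7)], ctx)  # verb now rejected
update(ctx, second)
third = recognize([("file", 0.99)], ctx)                     # noun now rejected
print(first, second, third)
```

In this sketch the higher-scoring but less likely candidate ("close") is discarded in favour of a predicted category, which is the behaviour claim 15 describes; a real system would apply the same filtering to acoustic hypotheses rather than to a fixed list.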